AITopics | recurrent structure

Collaborating Authors

recurrent structure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

d60743aab4b625940d39b3b51c3c6a78-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-14-2026, 10:29:03 GMT

kernel, sequence, string kernel, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Add feedback

Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks Qi Xu1Y uyuan Gao

Neural Information Processing SystemsOct-9-2025, 05:47:18 GMT

The performance of our model was evaluated in DVS128-Gesture dataset and other time-series datasets.

artificial intelligence, information, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

The current era of Natural Language Processing (NLP) is dominated by Transformer models. However, novel architectures relying on recurrent mechanisms, such as xLSTM and Mamba, have been proposed as alternatives to attention-based models. Although computation is done differently than with the attention mechanism mechanism, these recurrent models yield good results and sometimes even outperform state-of-the-art attention-based models. In this work, we propose Distil-xLSTM, an xLSTM-based Small Language Model (SLM) trained by distilling knowledge from a Large Language Model (LLM) that shows promising results while being compute and scale efficient. Our Distil-xLSTM focuses on approximating a transformer-based model attention parametrization using its recurrent sequence mixing components and shows good results with minimal training.

large language model, learning attention mechanism, natural language, (3 more...)

arXiv.org Artificial Intelligence

2503.18565

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

Add feedback

Liger: Linearizing Large Language Models to Gated Recurrent Structures

Lan, Disen, Sun, Weigao, Hu, Jiaxi, Du, Jusen, Cheng, Yu

arXiv.org Artificial IntelligenceMar-3-2025

Transformers with linear recurrent modeling offer linear-time training and constant-memory inference. Despite their demonstrated efficiency and performance, pretraining such non-standard architectures from scratch remains costly and risky. The linearization of large language models (LLMs) transforms pretrained standard models into linear recurrent structures, enabling more efficient deployment. However, current linearization methods typically introduce additional feature map modules that require extensive fine-tuning and overlook the gating mechanisms used in state-of-the-art linear recurrent models. To address these issues, this paper presents Liger, short for Linearizing LLMs to gated recurrent structures. Liger is a novel approach for converting pretrained LLMs into gated linear recurrent models without adding extra parameters. It repurposes the pretrained key matrix weights to construct diverse gating mechanisms, facilitating the formation of various gated recurrent structures while avoiding the need to train additional components from scratch. Using lightweight fine-tuning with Low-Rank Adaptation (LoRA), Liger restores the performance of the linearized gated recurrent models to match that of the original LLMs. Additionally, we introduce Liger Attention, an intra-layer hybrid attention mechanism, which significantly recovers 93\% of the Transformer-based LLM at 0.02\% pre-training tokens during the linearization process, achieving competitive results across multiple benchmarks, as validated on models ranging from 1B to 8B parameters. Code is available at https://github.com/OpenSparseLLMs/Linearization.

architecture, language model, liger, (12 more...)

arXiv.org Artificial Intelligence

2503.01496

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MSegRNN:Enhanced SegRNN Model with Mamba for Long-Term Time Series Forecasting

Zhao, GaoXiang, Wang, XiaoQiang

arXiv.org Artificial IntelligenceJul-15-2024

The field of long-term time series forecasting demands handling extensive look-back windows and long-range prediction steps, posing significant challenges for RNN-based methodologies. Among these, SegRNN, a robust RNN-driven model, has gained considerable attention in LTSF analysis for achieving state-of-the-art results while maintaining a remarkably streamlined architecture. Concurrently, the Mamba structure has demonstrated its advantages in small to medium-sized models due to its capability for information selection. This study introduces a variant of SegRNN that preprocesses information using a fine-tuned single-layer Mamba structure. Additionally, it incorporates implicit segmentation and residual structures into the model's encoding section to further reduce the inherent data iterative cycles of RNN architectures and implicitly integrate inter-channel correlations. This variant, named MSegRNN, utilizes the Mamba structure to select useful information, resulting in a transformed sequence. The linear-strategy-adapted derivative retains the superior memory efficiency of the original SegRNN while demonstrating enhanced performance. Empirical evaluations on real-world LTSF datasets demonstrate the superior performance of our model, thereby contributing to the advancement of LTSF methodologies.

forecasting, information, time sery, (16 more...)

arXiv.org Artificial Intelligence

2407.10768

Country:

Asia > Middle East > Iran (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback

Mining Frequent Structures in Conceptual Models

Fumagalli, Mattia, Sales, Tiago Prince, Barcelos, Pedro Paulo F., Micale, Giovanni, Zaytsev, Vadim, Calvanese, Diego, Guizzardi, Giancarlo

arXiv.org Artificial IntelligenceJun-11-2024

The problem of using structured methods to represent knowledge is well-known in conceptual modeling and has been studied for many years. It has been proven that adopting modeling patterns represents an effective structural method. Patterns are, indeed, generalizable recurrent structures that can be exploited as solutions to design problems. They aid in understanding and improving the process of creating models. The undeniable value of using patterns in conceptual modeling was demonstrated in several experimental studies. However, discovering patterns in conceptual models is widely recognized as a highly complex task and a systematic solution to pattern identification is currently lacking. In this paper, we propose a general approach to the problem of discovering frequent structures, as they occur in conceptual modeling languages. As proof of concept for our scientific contribution, we provide an implementation of the approach, by focusing on UML class diagrams, in particular OntoUML models. This implementation comprises an exploratory tool, which, through the combination of a frequent subgraph mining algorithm and graph manipulation techniques, can process multiple conceptual models and discover recurrent structures according to multiple criteria. The primary objective is to offer a support facility for language engineers. This can be employed to leverage both good and bad modeling practices, to evolve and maintain the conceptual modeling language, and to promote the reuse of encoded experience in designing better models with the given language.

conceptual model, graph, modeling language, (14 more...)

arXiv.org Artificial Intelligence

2406.07129

Country:

Europe > Netherlands (0.04)
Europe > Italy (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)

Add feedback

Practical Edge Detection via Robust Collaborative Learning

Fu, Yuanbin, Guo, Xiaojie

arXiv.org Artificial IntelligenceAug-27-2023

Edge detection, as a core component in a wide range of visionoriented tasks, is to identify object boundaries and prominent edges in natural images. An edge detector is desired to be both efficient and accurate for practical use. To achieve the goal, two key issues should be concerned: 1) How to liberate deep edge models from inefficient pre-trained backbones that are leveraged by most existing deep learning methods, for saving the computational cost and cutting the model size; and 2) How to mitigate the negative influence from noisy or even wrong labels in training data, which widely exist in edge detection due to the subjectivity and ambiguity of annotators, for the robustness and accuracy. In this paper, we attempt to simultaneously address the above problems via developing a collaborative learning based model, termed PEdger. The principle behind our PEdger is that, the information learned from different training moments and heterogeneous (recurrent and non recurrent in this work) architectures, can be assembled to explore robust knowledge against noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies together with quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets are conducted to verify the effectiveness of our design, and demonstrate its superiority over other competitors in terms of accuracy, speed, and model size. Codes can be found at https://github.co/ForawardStar/PEdger.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2308.14084

Country: